Associations between long-term exposure to air pollution and kidney function utilizing electronic healthcare records: a cross-sectional study

Background Chronic kidney disease (CKD) affects more than 38 million people in the United States, predominantly those over 65 years of age. While CKD etiology is complex, recent research suggests associations with environmental exposures. Methods Our primary objective was to examine associations of fine particulate matter (PM2.5), ozone (O3), and nitrogen dioxide (NO2) with creatinine-based estimated glomerular filtration rate (eGFRcr) and diagnosis of CKD, using a random sample of North Carolina electronic healthcare records (EHRs) from 2004 to 2016. We estimated eGFRcr using the serum creatinine-based 2021 CKD-EPI equation. PM2.5 and NO2 data come from a hybrid model using 1 km2 grids and O3 data from 12 km2 CMAQ grids. Exposure concentrations were 1-year averages. We used linear mixed models to estimate the change in eGFRcr per IQR increase of pollutants. We used multiple logistic regression to estimate associations between pollutants and first appearance of CKD. We adjusted for patient sex, race, age, comorbidities, temporality, and 2010 census block group variables. Results We found 44,872 serum creatinine measurements among 7,722 patients. An IQR increase in PM2.5 was associated with a change in eGFRcr of -1.63 mL/min/1.73m2 (95% CI: -1.96, -1.31), with O3 and NO2 showing positive associations. There were 1,015 patients identified with CKD through e-phenotyping and ICD codes. None of the environmental exposures were positively associated with a first-time measure of eGFRcr < 60 mL/min/1.73m2. NO2 was inversely associated with a first-time diagnosis of CKD with aOR of 0.77 (95% CI: 0.66, 0.90). Conclusions One-year average PM2.5 was associated with reduced eGFRcr, while O3 and NO2 were inversely associated. Neither PM2.5 nor O3 was associated with a first-time identification of CKD; NO2 was inversely associated. We recommend future research examining the relationship between air pollution and impaired renal function.
Supplementary Information The online version contains supplementary material available at 10.1186/s12940-024-01080-4.


Introduction
Chronic kidney disease (CKD) is a prevalent and growing health concern. Globally, CKD resulted in 1.2 million premature deaths in 2017, with an estimated prevalence of 697.5 million [1,2]. Annual mortality is expected to increase to 2.2-4.0 million by 2040 [2]. More than 38 million people in the United States live with CKD, most of whom are over the age of 65 [3,4]. Following worldwide trends, the prevalence in the US is expected to increase in the coming decades; for those over 65 years of age, the prevalence is expected to increase by 37.8% by 2030. Older adults often experience greater difficulty with both the successful management of CKD and, once the disease has progressed to end-stage kidney disease (ESKD), access to the resources necessary for kidney transplantation [5].
CKD is broadly defined by the presence of an estimated glomerular filtration rate (eGFR) of less than 60 mL/min per 1.73 m2, markers of kidney damage such as albuminuria or hematuria, or both, for a duration of ≥ 90 days, or the need for kidney replacement therapy [6]. Even a moderate decrease in kidney function (eGFR of 30 to 59 mL/min per 1.73 m2) increases the risk of hospitalization [7]. There are five stages of CKD, with stage 1 being the mildest and stage 5 indicating either severe impairment (eGFR < 15 mL/min per 1.73 m2) or kidney failure. Given the relatively mild symptoms of mild to moderate decreases in kidney function, most individuals are not aware when they are in the first few stages of CKD. Prevalence of early CKD appears to be higher in females, but males progress more quickly through the disease stages and have a higher risk of mortality [8]. While standardized mortality rates for other non-communicable diseases such as cancer and cardiovascular disease have declined, CKD has not seen the same substantial decrease [9].
Most cases of CKD are caused by diabetes, hypertension, or a combination of both conditions, while less common causes include primary glomerulonephritis, chronic tubulointerstitial nephritis, hereditary disease, and secondary glomerulonephritis or vasculitis [10]. These conditions are the manifestation of a combination of genetic, behavioral, and environmental factors [11]. The mechanisms by which these diseases damage the kidneys over time include, but are not limited to, systemic/intraglomerular hypertension, glomerular hypertrophy, precipitation of intrarenal calcium phosphate, inflammation, and altered metabolism [10]. This chronic, consistent damage changes the overall architecture of the kidneys, leading to scarring, and reduces their ability to function normally.
Recent research suggests environmental exposures as potential factors associated with the onset and progression of CKD in addition to these other factors [12].
Long-term exposure to air pollutants, specifically coarse particulate matter (PM10), fine particulate matter (PM2.5), and nitrogen dioxide (NO2), shows a mixed but overall consistent relationship with low kidney function [13,14]. Possible mechanisms include translocation of ultrafine particles directly into the bloodstream, oxidative stress responses, or changes in the ratios and total number of immune cells [15,16]. Ozone (O3) may impact the kidneys, as inhalation induces immunosuppressive and metabolic responses in the kidneys, heart, and liver [17]. Animal models suggest that inhalation of O3 alters gene expression in pathways involving inflammatory signaling, antioxidation, and endothelial function [18]. However, few studies have examined the effect of O3 on kidney disease in humans.
Our primary objective is to examine whether airborne exposure to PM2.5, O3, or NO2 is associated with (1) reduced renal function as measured by serum creatinine-based eGFRcr or (2) a diagnosis of CKD in a random sample of patients from the University of North Carolina Healthcare System (UNCHCS).

Study population
We defined the sampling frame for this study to include patients with available electronic health records (EHRs) containing information on kidney health. Specifically, we include those with available serum creatinine laboratory test results and ICD codes relevant to CKD (available in Supplemental Table 1). We then separate this population into two distinct groups. The first group consists of all individuals with reported lab values for serum creatinine. We use this group to analyze associations between air pollution and eGFRcr as a continuous outcome. The second group is restricted to patients with two measures of eGFRcr < 60 mL/min per 1.73 m2 more than 90 days apart and/or an ICD code indicating a physician diagnosis of CKD III-V. We consider members of this group positive cases of CKD. These positive cases are matched with controls for analysis.
To accomplish this, we utilize data from EHRs in the Environmental Protection Agency's Clinical and Archived Records Research for Environmental Studies (EPA CARES) [19,20]. Our sampling frame from the EPA CARES population is a random sample of 19,989 individuals (504,406 unique visits) who were seen at a UNCHCS affiliated hospital or clinic from January 1st, 2004, to December 31st, 2016. Any participants with implausible demographic information (e.g., older than 110 years, BMI above 50, etc.) were removed prior to any analysis, as these values were likely introduced through data-entry errors in the electronic health records. Additionally, we removed any individuals who did not reside in North Carolina (n = 403). For both groups (continuous and binary outcome), we linked the 1-year average PM2.5, O3, and NO2 prior to the date of (group 1) the serum creatinine laboratory test or (group 2) the second eGFRcr value < 60 mL/min per 1.73 m2 or the ICD code, whichever occurs earliest.
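The case definition for the second group can be sketched in code. This is a minimal illustration rather than the study's implementation: the function name `first_ckd_date`, the input layout, and the choice to pair the earliest low eGFRcr value with the first later low value more than 90 days afterwards are assumptions made here. The ICD prefixes are those listed under Assessment of renal function.

```python
from datetime import date

# ICD-9 585.3-585.6 and ICD-10 N18.3-N18.6 (CKD stages III-V), matched by prefix.
CKD_ICD_PREFIXES = (
    "585.3", "585.4", "585.5", "585.6",
    "N18.3", "N18.4", "N18.5", "N18.6",
)


def first_ckd_date(egfr_measures, icd_events, threshold=60.0, min_gap_days=90):
    """Return the earliest CKD indication date for one patient, or None.

    egfr_measures: list of (date, eGFRcr value) tuples.
    icd_events: list of (date, ICD code string) tuples.
    """
    # Lab-based criterion: two values below the threshold more than 90 days apart;
    # the index date is the date of the second qualifying value.
    low_dates = sorted(d for d, value in egfr_measures if value < threshold)
    lab_date = None
    if low_dates:
        first_low = low_dates[0]
        for d in low_dates[1:]:
            if (d - first_low).days > min_gap_days:
                lab_date = d
                break

    # Code-based criterion: earliest qualifying ICD code.
    icd_dates = [d for d, code in icd_events if code.startswith(CKD_ICD_PREFIXES)]
    icd_date = min(icd_dates) if icd_dates else None

    # Whichever indication occurs earliest defines the case date.
    candidates = [d for d in (lab_date, icd_date) if d is not None]
    return min(candidates) if candidates else None
```

A patient with neither criterion returns `None` and would remain in the pool of potential controls.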

Assessment of renal function
For the eGFR analyses, we use serum creatinine levels to assess kidney function as our first outcome. We estimated eGFRcr using the 2021 CKD-EPI equation for serum creatinine:

$$\mathrm{eGFR_{cr}} = 142 \times \min(\mathrm{Scr}/\kappa, 1)^{\alpha} \times \max(\mathrm{Scr}/\kappa, 1)^{-1.200} \times 0.9938^{\mathrm{Age}} \times 1.012\ [\text{if female}]$$

This equation was updated in 2021 to no longer include race in estimates of eGFRcr. Here, Scr is serum creatinine in mg/dL, κ is 0.7 for females and 0.9 for males, α is -0.241 for females and -0.302 for males, and min and max denote the minimum or maximum of Scr/κ and 1 [21]. We used the Tukey method to remove outliers, eliminating serum creatinine values more than 1.5 standard deviations above Q3 or 1.5 standard deviations below Q1 before we calculated eGFRcr [22]. Information on UACR and serum cystatin C was not included in this analysis as these measures were not available in the data. The analyses focused on eGFRcr as a continuous outcome include all recorded measures. Individuals may be included multiple times in the same dataset.
For our second outcome of interest, we designate an e-phenotype using methods similar to those described in previous research on kidney health utilizing EHRs [23,24]. We consider a patient a positive case of CKD stage III-V if the patient presents with two eGFRcr measures < 60 mL/min per 1.73 m2 more than 90 days apart or has an ICD code indicating physician diagnosis. If patient data contain both types of diagnoses, we take the earliest diagnostic date. If a patient only has eGFRcr measures, we take the second measure as the diagnostic date. ICD-9 codes include 585.3-585.6 and ICD-10 codes include N18.3-N18.6. We use this method because many people living with CKD are unaware of their condition and may not have been diagnosed by a physician. Using similar methods, Paik et al. 2021 achieved positive predictive values > 80% [23]. One-year air pollutant averages for the preceding 365 days are linked to the exact serum creatinine laboratory date as exposures.

Matching identified CKD cases and controls
For our CKD analyses, to ensure that our sampling was robust against bias, we matched each case to four controls (1:4) who were never diagnosed or identified as having CKD by e-phenotyping. We performed this matching using the 'MatchIt' package in RStudio. This package allowed for matching cases and controls based on designated input variables to produce more robust results with less sensitivity to assumptions. We matched according to propensity scores based on diagnosis date, age, race, and sex. For controls, who do not have a diagnosis date, we match on the closest hospital visit (Supplementary Fig. 2). We matched on the 'optimal' controls using the propensity score generated from the matching variables. To ensure that we were not selecting matches from different geographic regions of the state, potentially introducing confounding, we compared the percentages of cases and controls taken from the eight climate divisions of the state (see Supplementary information) [25]. This matching was only done for patients with identified CKD. We then calculated differences in dates between cases and controls to ensure that we were sampling from similar time frames.
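The 2021 CKD-EPI creatinine equation used for the eGFRcr outcome, with the κ and α values given under Assessment of renal function, can be sketched as follows. The function name and argument names are illustrative choices made here, not taken from the study code.

```python
def egfr_ckd_epi_2021(scr_mg_dl, age_years, female):
    """Creatinine-based eGFR (mL/min per 1.73 m2) from the 2021 CKD-EPI equation.

    scr_mg_dl: serum creatinine in mg/dL.
    age_years: age in years.
    female: True for female patients, False for male patients.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha      # applies when creatinine is below kappa
        * max(ratio, 1.0) ** -1.200     # applies when creatinine is above kappa
        * 0.9938 ** age_years
    )
    # Sex coefficient from the equation: multiply by 1.012 if female.
    return egfr * 1.012 if female else egfr
```

Because the equation contains no race term, the same function applies to all patients; only sex, age, and serum creatinine enter the estimate.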

Exposure assessment
For PM2.5 and NO2 data, we used an ensemble model constructed by Di et al. that incorporates satellite aerosol measures, land-use regression, chemical transport models, and meteorological data [26]. This model incorporates three machine learning algorithms that predict pollutant concentrations in 1 × 1 km grids for the entirety of the contiguous United States. This model has been cross-validated with an R2 of 0.89 (for the US Middle Atlantic Region) and shows accurate performance at concentrations up to approximately 60 µg/m3 [26]. The CARES patient data include the primary addresses of patients, which we link to the appropriate 1 × 1 km grid. Where primary addresses were not successfully geocoded, we matched patients to the 1 × 1 km grid cell of the centroid of their primary residence ZIP code. O3 data come from the 12 km2 Community Multiscale Air Quality Modeling System (CMAQ) model; specifically, we use averaged 8-h maximum concentrations for O3 and averaged 24-h concentrations for NO2 [27]. CMAQ utilizes hourly measured pollutant data along with meteorological information to estimate pollutant concentrations at the census tract level. For all three pollutants, we estimate annual averages for all included patients.
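The exposure linkage (a 1-year average over the 365 days preceding an index date) can be illustrated with a short sketch. It assumes daily concentration estimates for a patient's grid cell are already available as a dictionary keyed by date; the function name, the input layout, the exclusion of the index date itself, and the handling of missing days are all assumptions made here for illustration.

```python
from datetime import date, timedelta


def one_year_average(daily_values, index_date, window_days=365):
    """Mean concentration over the window_days immediately before index_date.

    daily_values: dict mapping date -> concentration for one grid cell.
    Returns None when no daily values fall inside the window.
    """
    start = index_date - timedelta(days=window_days)
    # Window covers start (inclusive) up to, but not including, the index date.
    window = [v for d, v in daily_values.items() if start <= d < index_date]
    if not window:
        return None
    # Days with no estimate are simply skipped (an assumption of this sketch).
    return sum(window) / len(window)
```

The same function serves both analyses: the index date is the serum creatinine laboratory date for the continuous outcome and the first CKD indication date for the case-control analysis.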

Covariates
We chose covariates based on previously published research examining associations between air pollution and renal function [28]. We include individual-level sociodemographic information on age, race (Caucasian, African American, other), and sex as factors. We created the 'other' race category because there were too few patients who were neither Caucasian nor African American to include separately in models. Clinical diagnoses of both diabetes and hypertension were included in descriptive statistics based on ICD-9 and ICD-10 codes (250.x and E11.x for diabetes and 401.9 and I10 for primary hypertension) (full list of ICD codes available in Supplementary information). However, these were excluded from models as they are both likely mediators of kidney function and onset of CKD. Event-specific instances of these diseases (e.g., pregnancy-induced hypertension) were not included in this analysis. We adjusted for the following 2010 census/2013 5-year ACS variables at the block group level: income, percent older housing (built before 1979), percent living in poverty, urbanicity, and percent of the population on public assistance, all as continuous covariates. Education (percent with a bachelor's degree or higher) and median price of housing were included in descriptive statistics but excluded from final models due to high collinearity (|r| > 0.7) with income. Lastly, climate zones (identified from climatechange.nc.gov) were included as a factor in our models as a regional adjustment for unmeasured factors that differ between regions in North Carolina. Smoking status and body mass index (BMI) were not included in the main analyses as they were not recorded for a large portion of patients; they are reported only in secondary analyses.
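The collinearity screen described above (dropping block group covariates with |r| > 0.7 against income) can be sketched as follows. The helper names and the example values in the usage note are hypothetical; only the 0.7 cutoff comes from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_with(reference, candidates, cutoff=0.7):
    """Names of candidate covariates whose |r| with the reference exceeds cutoff."""
    return [
        name
        for name, values in candidates.items()
        if abs(pearson_r(reference, values)) > cutoff
    ]
```

For example, calling `collinear_with(income, {"education": education, "house_value": house_value})` on block group vectors would return the names of covariates to leave out of the final models under this rule.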

Statistical analyses
We analyzed associations between eGFRcr and air pollutants using linear mixed models, presenting unadjusted and fully adjusted models, with a random intercept for patient ID. We first calculated descriptive statistics for patients. Following this, we calculated Pearson correlations between PM2.5, O3, and NO2 to examine the relationship between the exposures of interest. To make exposures more comparable, we then calculated the interquartile range (IQR) of each for use in the models. We controlled for the continuous census block group covariates including average income, percent older housing (built before 1979), percent living in poverty, urbanicity, and percent of the population on public assistance. Demographic covariates included age, sex, and race. As patients were more likely to be sampled from geographic regions closer to hospitals near the flagship UNC Chapel Hill hospital, we control for the climate zones (as identified by the NC Climate Division, map available in Supplementary information) in North Carolina. There are eight climate zones in North Carolina; however, due to too few observations, we only include zones 3-8 in our analyses (n = 15 patients removed).
We calculated eGFRcr as a continuous outcome along with serum creatinine pre-transformation as a secondary outcome. One-year average concentrations for PM2.5, O3, and NO2 were matched to the date the laboratory test for serum creatinine was completed. We then calculated IQRs for each pollutant during the 1-year period to make them more comparable. Our fully adjusted models included age and race, census block group information (median income, % older housing, % poverty, urbanicity, % on public assistance), geographic region, and exposures (PM2.5, O3, and NO2) along with unrestricted natural cubic spline adjustment for long-term temporal variations, with the number of splines based on the Akaike information criterion. We present only the results of multipollutant models; information on single pollutant models is available in Supplementary Table S3.
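Expressing an estimate per IQR increase amounts to multiplying the per-unit model coefficient by the pollutant's IQR. The sketch below illustrates this rescaling; the function names are hypothetical, and the quartile convention (the `inclusive` method of Python's `statistics.quantiles`) is an assumption, since the text does not state which convention was used.

```python
from statistics import quantiles


def interquartile_range(values):
    """IQR (Q3 - Q1) of a sequence of exposure concentrations."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def per_iqr_estimate(beta_per_unit, iqr):
    """Rescale a per-unit regression coefficient to a per-IQR change."""
    return beta_per_unit * iqr
```

Rescaling in this way puts pollutants measured on different scales (µg/m3 for PM2.5, ppb for O3 and NO2) on a common footing: each estimate describes a shift from the 25th to the 75th percentile of that pollutant's distribution.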
We conducted unconditional multiple logistic regression to estimate odds ratios between first indication date of CKD and air quality for the 1 year prior to diagnosis, comparing our cases and controls (results of conditional models are available in Table S4) [29]. The census block group, demographic, and comorbidity covariates included in our multiple logistic regression models were the same as those included in the linear mixed models. All analyses and visualizations were completed using SAS software version 9.4 and RStudio 4.0.3 [30,31]. In RStudio we used the package 'MatchIt' for matching cases and controls for our multiple logistic regression models [32].
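As a worked form of how a per-IQR adjusted odds ratio relates to a fitted logistic coefficient, the following is a sketch. The symbols $\hat{\beta}$, $\mathrm{SE}(\hat{\beta})$, and $\mathrm{IQR}_x$ are notation introduced here, and the Wald-type interval is an assumption about how the confidence intervals were formed.

$$
\mathrm{aOR}_{\mathrm{IQR}} = \exp\!\left(\hat{\beta}\cdot \mathrm{IQR}_x\right), \qquad
95\%\ \mathrm{CI} = \exp\!\left[\left(\hat{\beta} \pm 1.96\,\mathrm{SE}(\hat{\beta})\right)\cdot \mathrm{IQR}_x\right]
$$

Under this reading, an aOR below 1 corresponds to a negative coefficient for that pollutant, and a confidence interval that excludes 1 corresponds to a coefficient interval that excludes 0.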

Sensitivity analysis
Body mass index (BMI) and smoking status were not reported for all patients in the random sample and were used in secondary analyses to ensure that their inclusion did not alter the linear mixed models described previously. BMI is available for n = 18,639 (n = 4,834 patients) serum creatinine lab measures and smoking status is available for n = 30,913 (n = 5,532 patients). Smoking status was separated into five categories: current, current/former, former, never/former, and never. Smoking status was taken from the same day the serum creatinine test was performed or, if it was not assessed that day, from the nearest prior date on which smoking status was available. We ran two additional fully adjusted models with the same covariates in addition to BMI (continuous) and smoking status. For both analyses including BMI and smoking status, we calculate associations with and without the additional confounder to ensure that differences seen are not driven by underlying characteristics of the new sampling frames. We ran the fully adjusted models comparing cases of CKD to the entire random sample (available in Supplementary data). For patients without CKD, we ran two models, attaching an exposure date as both first appearance in the hospital system and median visit to ensure that results were consistent at different time points. Lastly, we include only those with street-level geocoded addresses (some patients were coded at ZIP code level) to ensure the most accurate assignment of exposures.
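The rule for attaching smoking status (same-day value if available, otherwise the nearest prior record) is an "as-of" lookup. A minimal sketch, assuming each patient's smoking records are available as (date, status) pairs; the function name and data layout are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def smoking_status_as_of(records, lab_date):
    """Most recent smoking status recorded on or before lab_date, else None.

    records: list of (date, status) tuples for one patient, in any order.
    """
    ordered = sorted(records)
    dates = [d for d, _status in ordered]
    # Index of the first record dated strictly after the lab date.
    i = bisect_right(dates, lab_date)
    return ordered[i - 1][1] if i > 0 else None
```

Records dated after the laboratory test are never used, so a lab measure that precedes a patient's first smoking assessment is left without a status.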

Stratified analysis
We also stratified by individuals who were exposed to 1-year PM2.5 averages ≥ 12 µg/m3 and those < 12 µg/m3 [33]. This threshold was chosen based on the current (2022) National Ambient Air Quality Standards (NAAQS) primary standard for PM2.5. To estimate associations of air pollution among patients with low-functioning kidneys, we stratify by those patients with a measure of eGFRcr < 60 mL/min/1.73 m2.

Results
There were N = 7,722 patients with available serum creatinine to calculate eGFRcr and exposure data linked to primary address. Within this group there were N = 44,486 serum creatinine tests available during our study period (Tables 1 and 2). BMI is available for n = 18,639 (n = 4,834 patients) serum creatinine lab measures and smoking status is available for n = 30,913 (n = 5,532 patients). Patients with serum creatinine measures average 53.6 (SD: 17.9) years of age, are majority female (58.2%), and majority Caucasian (64.7%). The prevalences of diabetes and hypertension in this group were 25.6% and 57.6%, respectively. This group was, on average, exposed to 1-year median concentrations of 9.52 µg/m3 PM2.5 (IQR: 1.57), 39.7 ppb O3 (IQR: 3.21), and 12.8 ppb NO2 (IQR: 8.68).
We included N = 4,952 patients in our case-control sample, which captures all patients identified with CKD and corresponding controls. Within this group, we identified 1,015 patients with severely limited kidney function (ICD code or two eGFRcr < 60 mL/min per 1.73 m2), and 3,937 non-CKD patients as controls. Those with CKD were more likely to be diagnosed with diabetes and/or hypertension. Patients with CKD had patterns of lower block group SES as indicated by higher percent poverty, lower average income, and lower median house value. However, there were few differences in block-level percent poverty or percent on public assistance. Those diagnosed with CKD were exposed to lower 1-year median concentrations of PM2.5 (11.2 S2). All our SMDs between cases and controls were less than 0.1, indicating adequate balancing, with the exception of African American patients (SMD = 0.11) and Other Race (SMD = -0.11); this may limit the interpretability of the results in these cases.
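The balance check on standardized mean differences (SMDs) can be sketched for a binary covariate such as a race category. The formula below, which divides the difference in proportions by the square root of the average of the two group variances, is a common definition and is assumed here; the text does not state which SMD formula was used, and the function name is hypothetical.

```python
from math import sqrt


def smd_binary(p_cases, p_controls):
    """Standardized mean difference for a binary covariate.

    p_cases, p_controls: proportion with the characteristic in each group.
    """
    pooled_sd = sqrt((p_cases * (1 - p_cases) + p_controls * (1 - p_controls)) / 2)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both groups have zero variance")
    return (p_cases - p_controls) / pooled_sd
```

With the 0.1 threshold used above, a covariate whose absolute SMD exceeds 0.1 would be flagged as imperfectly balanced between cases and matched controls.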

Discussion
In this study, we examined the relationship between 1-year average PM2.5, O3, and NO2 concentrations with kidney function as measured by eGFRcr and first indication of CKD. Only increases in 1-year mean concentrations of PM2.5 were associated with a decrease in eGFRcr, while O3 and NO2 were not. The trends seen for eGFRcr were similar to the associations between the three air pollutants and serum creatinine prior to These results support other epidemiologic studies that report an inverse association between IQR increases in PM2.5 and eGFRcr [34][35][36]. Likewise, the observed positive associations between O3 and eGFRcr have been reported in other studies [37]. Weaver et al. report a lack of association between long-term O3 exposure and decreased eGFR in a manner consistent with these results [28]. These findings contrast with prior studies that find associations between NO2 and kidney function as measured by both eGFR and risk of CKD/ESKD [38,39]. Generally, though, levels of air pollution in China are higher than in the US/Europe, so there may be dose-dependent responses we do not see in this study, particularly as these two studies reported higher concentrations of NO2.
Despite higher concentrations of PM2.5 being associated with lower eGFRcr, we do not see a similar relationship between PM2.5 and first-time indication of CKD. Inverse associations with incident CKD were seen for O3 and NO2. Prior studies, such as Yang et al. 2022, have reported a positive association between O3 and the prevalence of CKD in a nationwide Chinese study [37]. Other studies have found no associations between O3 and incidence of CKD, suggesting that more studies are needed that reflect impacts on the general population [40]. O3 is relatively less studied than other criteria air pollutants in association with CKD [41]. It is notable that the majority of studies focusing on air pollution and kidney function/disease take place in either the United States or Asia or study a special population such as military veterans [13,35]. As such, more research should be conducted to better understand these associations in other geographic regions, cultural and social contexts, climatic regions, etc.
African American patients were exposed to higher concentrations of PM2.5 on average than both Caucasian patients and patients of another race. In our stratified analyses, African American patients were more likely to have lower eGFRcr when compared to Caucasians. These differences in outcome by race are likely a result of social determinants of health impacting disparities in renal health [42]. The African American population in the US, relative to other ethnicities, makes up a disproportionately large percentage of those with CKD [43]. It is important for future research to investigate other potential environmental and social contributors to these health disparities, and the interactions between them.

Fig. 2 Results of logistic regression models examining the associations of air pollutants with CKD
We further conducted stratified analyses by PM2.5 concentrations and age, and limited analyses to those with street-level geocoded addresses. We found the associations between PM2.5 and decreased eGFRcr were attenuated at concentrations < 12 µg/m3 but strengthened at higher levels. This particular result needs further investigation, as these findings do not directly support the growing body of evidence that even at lower concentrations (e.g., US NAAQS standards), air pollution has deleterious impacts on health [44][45][46]. On the impacts of age, the estimate for PM2.5 and eGFRcr was stronger for those < 65 years of age, with weaker associations observed for older patients. This may be a result of older patients being more likely to be on anti-hypertensive medication or medication for diabetes. After stratifying by CKD status, or those not identified with CKD, the association with O3 is inverse. For individuals diagnosed with CKD, given the high percentage of comorbidities, the impact of air pollution may be exacerbated through direct effects on the regulation of both blood sugar and blood pressure [47].
A limitation of this study is that the patients included in these analyses resided predominantly in central North Carolina, where the majority of UNCHCS affiliated hospitals or clinics are located. Due to relative underrepresentation, our African American and Other Race patients were not matched to the level of our Caucasian patients, which could introduce bias. With address geocoding there is always the possibility of misclassification that cannot be assumed to trend towards the null. As such, this study may lack generalizability to the general population. Using 1-year average air pollution concentrations does not capture the entirety of the time air pollution potentially impacted kidney health. Census block group level covariates do not necessarily capture individual SES, which ideally would have been measured at the individual level; there could be residual confounding by SES. Finally, it is likely that serum creatinine was measured more often for patients with suspected renal dysfunction, biasing the sample towards those with already impaired kidney function. Unexpected directionality arose concerning associations between NO2 and kidney function in our analyses. This is likely due to additional, unmeasured confounding, dose-dependent effects, geographic proximity to roadways, etc. that were not addressed in the scope of this work. We recommend that additional research focus on this area. Further, there is the possibility of a time × exposure interaction that was not accounted for in the current study. Lastly, there are limitations when using EHR data, such as representativeness, the data available, missing or incorrectly entered measures, etc.
One of the strengths of this study is that it takes a random sample of patients visiting the North Carolina healthcare system and does not focus on a special subpopulation. This random sample has near-complete clinical phenotyping and well-validated air pollution modeling estimates, matched with high-precision geocoding. Additionally, by utilizing e-phenotyping of CKD, we may be more accurately estimating associations between air pollution and reduced renal function, as CKD is often not diagnosed until the later stages.
In conclusion, we observed reduced renal function, as measured by eGFRcr, with 1-year concentrations of PM2.5, but not with O3 or NO2. No exposures were associated with increased odds of CKD, while NO2 was inversely associated. This study provides further evidence that long-term exposure to fine particulate matter is associated with reduced renal function and may contribute to adverse outcomes.

Fig. 1 Results of linear mixed models examining the associations of air pollutants with eGFRcr.

Li et al., 2021 examined a smaller population (n = 169) of older adults, while Liang et al. 2021 reported the results of a large (n = 47,086) nationally representative sample. It is possible that there is a relationship between NO2 and decreased renal function, but 1-year concentrations are not long enough to capture the relationship. NO2 is very source dependent, and future work may want to investigate those living near roadways, as this is a major source of NOx exposure in the United States. In a nationwide cross-sectional study in China, Liang et al. show an increased risk of developing CKD with longer term exposure to higher concentrations of NO2, with the risk being greatest at 5 years of exposure (for example, Liang et al. and Li et al. report median concentrations for NO2 at approximately 24 and 23 ppb, respectively).

Table 1
Descriptive characteristics for the subset of NC-CARES with (1) available eGFRcr values and (2) cases and controls for patients diagnosed with CKD
Data for block groups come from the 2010 US Census/2013 5-Year ACS; any category that does not sum to 100% is a result of rounding; education is measured by percentage with a bachelor's degree or higher; older housing refers to the percentage of houses built before 1979

Patients with eGFR cr N = 7,722
the entire patient group with an estimate of -1.57 (95% CI: -1.91, -1.23). For this group both O3 and NO2 were not associated with eGFRcr (available in Supplementary information).

Table 2
Pearson correlations of PM2.5, O3, and NO2 for both the linear mixed and the multiple logistic regression models

Table 3
Results from mixed linear models & logistic regression of 1-year PM2.5, O3, NO2 and kidney function among a random sample of NC CARES serum creatinine laboratory results (N = 44,486)
a Model 1: estimate with only random intercepts for patients and spline adjustment for temporal variation
b Model 2: fully adjusted linear mixed model estimate for temporal variations, age, sex, race, comorbidities, and census block group
c IQR for exposures in our eGFRcr analysis are: PM2.5, 1.43 µg/m3; O3, 2.81 ppb; NO2, 8.49 ppb
d IQR for exposures in our CKD analysis are: PM2.5, 3.39 µg/m3; O3, 3.36 ppb; NO2, 10.45 ppb